---
title: "purrr & functions exploration"
output: 
  html_notebook:
    theme: spacelab
    highlight: tango
    df_print: paged
    toc: true
    toc_float: 
      collapsed: false
      smooth_scroll: false
    number_sections: true
    toc_depth: 6
---


<style type="text/css">

body, td {
   font-family: "OCR-B 10 BT";
}
code.r{
  font-family: "OCR-B 10 BT";
}
pre {
  font-family: "OCR-B 10 BT";
}
</style>



# Exploring purrr functions

from: 

http://www.rebeccabarter.com/blog/2019-08-19_purrr/

https://adv-r.hadley.nz/functionals.html#purrr-style

## options & settings

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

```{r}
options(scipen = 999)
```


## Libs

```{r}
library(tidyverse)
```


## gapminder data

```{r}
# to download the data directly:

gapminder_orig <- read.csv("https://raw.githubusercontent.com/swcarpentry/r-novice-gapminder/gh-pages/_episodes_rmd/data/gapminder-FiveYearData.csv")

# define a copy of the original dataset that we will clean and play with 
gapminder <- gapminder_orig
```


```{r}
head(gapminder)
```

```{r}
names(gapminder)
```

```{r}
dim(gapminder)
```

## map & modify from purrr

### class

```{r}
class(gapminder$country)
```


```{r}
gapminder %>% map_chr(class)
```

```{r}
gapminder %>% modify(class)
```

`modify()` returns an object of the same type as its input (here a data frame) rather than a named character vector, so it is not a suitable choice for summarising columns like this.
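
As a rough illustration of the difference, the sketch below uses a small made-up data frame (`toy_df` is hypothetical, not part of the gapminder data). `map_chr()` summarises each column into one value, whereas `modify()` is better suited to transforming columns while keeping the data frame.

```{r}
library(purrr)

# Hypothetical toy data, used only for illustration
toy_df <- data.frame(a = 1:3, b = c(1.5, 2.5, 3.5))

# map_chr() collapses each column to a single string: a named character vector
toy_classes <- map_chr(toy_df, class)
toy_classes

# modify() transforms each column and returns a data frame again
toy_doubled <- modify(toy_df, ~ .x * 2L)
toy_doubled
```

The first result has one entry per column; the second has the same shape as `toy_df`.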


### n_distinct

```{r}
gapminder %>% map_dbl(n_distinct)
```

### class + n_distinct

```{r}
gapminder %>% map_df(~data.frame(class = class(.x),
                                 distinct = n_distinct(.x)))
```

Make sure to reference `.x` inside the formula; otherwise the function is not applied to each column.

Note: the output above does not include the column names.

Adding column names with the `.id` argument:

```{r}
gapminder %>% map_df(~data.frame(class = class(.x),
                                 distinct = n_distinct(.x)),
                     .id = "column_name")
```

Testing the function body on a single column (here the first one) before mapping over all of them:

```{r}
data.frame(n_distinct = n_distinct(gapminder %>% pluck(1)),
           class = class(gapminder %>% pluck(1)))
```

```{r}
continent_year <- gapminder %>% distinct(continent, year)
continent_year
```


```{r}
continents <- continent_year %>% 
                pull(continent) %>% 
                as.character

years <- continent_year %>% 
            pull(year)
```


```{r}
gapminder %>% 
  filter(continent == continents[1],
         year == years[1]) %>% 
  
  ggplot() +
  geom_point(aes(x = gdpPercap, y = lifeExp, col = country)) +
  ggtitle(paste(continents[1], years[1]))
```

```{r}
.x <- continents[1]
.y <- years[1]
  
gapminder %>% 
  filter(continent == .x,
         year == .y) %>% 
  
  ggplot() +
  geom_point(aes(x = gdpPercap, y = lifeExp, col = country)) +
  ggtitle(paste(.x, .y))
```

Applying the test code above to every continent/year pair with `map2()`:

```{r}
plot_list <- map2(
  .x = continents,
  .y = years,
  .f = ~ {
    gapminder %>%
      filter(continent == .x, year == .y) %>%
      ggplot() +
      geom_point(aes(x = gdpPercap, y = lifeExp, col = country)) +
      ggtitle(paste(.x, .y))
  }
)
```
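
`map2()` walks over its two inputs in parallel, pairing the first element of `.x` with the first of `.y`, and so on; it does not form all combinations. A minimal sketch with made-up vectors (`letters_in` and `numbers_in` are illustrative names):

```{r}
library(purrr)

letters_in <- c("a", "b", "c")
numbers_in <- c(1, 2, 3)

# Element i of .x is paired with element i of .y
paired <- map2_chr(letters_in, numbers_in, ~ paste(.x, .y))
paired
```

This is why `continents` and `years` are built from the same `distinct()` result: row i of that table supplies both arguments for plot i.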


```{r}
plot_list[1]
```

```{r}
plot_list[[22]]
```

Below I nest the gapminder data by continent.

```{r}
gapminder_nested <- gapminder %>% 
                      group_by(continent) %>% 
                      nest()

gapminder_nested
```

```{r}
# name the list-column explicitly; recent tidyr versions warn when `cols` is missing
gapminder_nested %>% unnest(cols = c(data))
```


```{r}
gapminder_nested$data[[1]]
```

```{r}
gapminder_nested$data[[2]]
```


To extract a column from the nested tibble by index (here the first column, `continent`):

```{r}
gapminder_nested %>% 
  pluck(1)
```

To extract the first element of the `data` list-column:

```{r}
gapminder_nested %>% 
  pluck("data",1)
```
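
`pluck()` accepts a sequence of names and positions and applies them one after another, much like chained `[[`. A small sketch on a hypothetical nested list (`nested_toy`), including the `.default` argument for missing elements:

```{r}
library(purrr)

# Hypothetical nested list, used only for illustration
nested_toy <- list(
  id = 1,
  scores = list(math = c(90, 85), art = c(70, 75))
)

# Equivalent to nested_toy[["scores"]][["math"]][[2]]
second_math <- pluck(nested_toy, "scores", "math", 2)

# A missing element gives NULL unless .default is supplied
missing_plain   <- pluck(nested_toy, "scores", "music")
missing_default <- pluck(nested_toy, "scores", "music", .default = NA)
```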


```{r}
tibble(list_col = list(c(1, 5, 7),
                       5,
                       c(10, 10, 11))) %>% 
    mutate(list_sum = map(.x = list_col, .f = sum))
```

Because `map()` always returns a list, `list_sum` is a list-column; pulling it out shows the underlying list:

```{r}
tibble(list_col = list(c(1, 5, 7),
                       5,
                       c(10, 10, 11))) %>% 
    mutate(list_sum = map(.x = list_col, .f = sum)) %>% 
  pull(list_sum)
```

It is usually better to return a vector instead of a list, which is what `map_dbl()` does:

```{r}
tibble(list_col = list(c(1, 5, 7),
                       5,
                       c(10, 10, 11))) %>% 
  mutate(list_sum = map_dbl(.x = list_col, .f = sum))
```
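
The suffix of each `map_*()` variant fixes the type of the returned vector, and the function must return a single value of that type for every element. A brief sketch reusing the same three-element list (`toy_list` and the result names are illustrative):

```{r}
library(purrr)

toy_list <- list(c(1, 5, 7), 5, c(10, 10, 11))

sums    <- map_dbl(toy_list, sum)                          # double vector
sizes   <- map_int(toy_list, length)                       # integer vector
has_ten <- map_lgl(toy_list, ~ any(.x == 10))              # logical vector
labels  <- map_chr(toy_list, ~ paste(.x, collapse = "-"))  # character vector
```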

How to get the mean of a column from one of the data frames stored in the `data` list-column:

```{r}
.x <- gapminder_nested %>% 
        pluck("data", 1)
```


```{r}
mean(.x$lifeExp)
```

Now applying the mean function to every data frame in the list-column:

```{r}
gapminder_nested %>% 
  mutate(avg_lifeExp = map_dbl(data, ~{mean(.x$lifeExp)}))
```

### Fitting a linear model for each continent / row

```{r}
gapminder_nested <- gapminder_nested %>% 
  mutate(lm_obj = map(data, ~lm(lifeExp ~ pop + gdpPercap + year, data = .x)))

gapminder_nested
```

#### Checking the linear model for the first continent

```{r}
gapminder_nested %>% pluck("lm_obj", 1)
```

### Adding Predictions

```{r}
gapminder_nested <- gapminder_nested %>% 
  mutate(pred = map2(.x = lm_obj, .y = data, function(.x,.y) predict(.x, .y)))

gapminder_nested
```

This can also be written with more descriptive argument names:

```{r}
gapminder_nested %>% 
  mutate(pred = map2(lm_obj, data, function(.lm, .data) predict(.lm, .data)))

```


#### Correlation between predicted and true response

```{r}
gapminder_nested <- gapminder_nested %>% 
  mutate(cor = map2_dbl(pred, data, function(.pred, .data) cor(.pred, .data$lifeExp)))

gapminder_nested
```

```{r}
gapminder %>% 
  group_by(continent) %>% 
  nest %>% 
  mutate(lm_obj = map(data, ~lm(lifeExp ~ pop + year + gdpPercap, data = .))) %>% 
  mutate(lm_tidy = map(lm_obj, broom::tidy))
```


```{r}
gapminder %>% 
  group_by(continent) %>% 
  nest %>% 
  mutate(lm_obj = map(data, ~lm(lifeExp ~ pop + year + gdpPercap, data = .))) %>% 
  mutate(lm_tidy = map(lm_obj, broom::tidy)) %>% 
  ungroup() %>% 
  transmute(continent, lm_tidy) %>% 
  unnest(cols = c(lm_tidy))
```


### split function

`split()` divides the data frame into a list of data frames, one for each level of the grouping variable provided (here `continent`).

```{r}
gapminder %>% split(gapminder$continent)
```
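
To see the shape of the result without printing all of gapminder, here is a sketch on a tiny made-up data frame (`toy_groups` is hypothetical). The output is a named list with one data frame per group:

```{r}
# Hypothetical toy data, used only for illustration
toy_groups <- data.frame(
  group = c("a", "b", "a", "b", "c"),
  value = c(1, 2, 3, 4, 5)
)

# One list element per distinct value of `group`
toy_split <- split(toy_groups, toy_groups$group)

names(toy_split)   # "a" "b" "c"
nrow(toy_split$a)  # 2
```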


```{r}
set.seed(23489)

gapminder_list <- gapminder %>% 
  split(gapminder$continent) %>% 
  map(~sample_n(., 5))

gapminder_list
```


### keep()

`keep()` retains only the list elements that satisfy a condition (a predicate function).

`discard()` is the opposite of `keep()`: it drops the elements that satisfy the condition.

```{r}
gapminder_list %>% 
  keep(~{mean(.x$lifeExp) > 70})
  
```
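
A minimal sketch of the `keep()` / `discard()` pair on a made-up list of numeric vectors (`toy_values` is an illustrative name), using the same style of predicate:

```{r}
library(purrr)

toy_values <- list(low = c(1, 2, 3), mid = c(40, 50), high = c(80, 90))

kept <- keep(toy_values, ~ mean(.x) > 30)     # elements where the predicate is TRUE
rest <- discard(toy_values, ~ mean(.x) > 30)  # elements where the predicate is FALSE

names(kept)  # "mid" "high"
names(rest)  # "low"
```

Together the two results always partition the original list.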


### reduce()

`reduce()` is designed to combine (reduce) all of the elements of a list into a single object by iteratively applying a binary function (a function that takes two inputs).

```{r}
reduce(c(1, 2, 3), sum)
```
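
To make the iteration explicit: `reduce()` applies the function to the first two elements, then to that result and the third element, and so on. A sketch with `+` and `paste()`; the optional `.init` argument supplies a starting value:

```{r}
library(purrr)

# ((1 + 2) + 3) + 4
total <- reduce(c(1, 2, 3, 4), `+`)

# Same folding with strings: paste(paste("a", "b"), "c")
joined <- reduce(c("a", "b", "c"), paste)

# .init is used as the first accumulated value
total_from_100 <- reduce(c(1, 2, 3, 4), `+`, .init = 100)
```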

### accumulate()

`accumulate()` works like `reduce()` but also returns the intermediate values.

```{r}
accumulate(c(1, 2, 3), sum)
```
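
For example, accumulating with `+` gives a running total and with `*` a running product, matching base R's `cumsum()` and `cumprod()` (a small sketch):

```{r}
library(purrr)

running_sum  <- accumulate(c(1, 2, 3, 4), `+`)  # 1 3 6 10
running_prod <- accumulate(c(1, 2, 3, 4), `*`)  # 1 2 6 24
```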

`reduce()` is also useful for combining several data frames, for example through repeated `left_join()` or `rbind()` calls:

```{r}
gapminder_list %>% 
  reduce(rbind)
```
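
The join case can be sketched the same way. The three small data frames below are made up for illustration and share a hypothetical key column `id`; the lambda passes the same `by` to every `left_join()` call:

```{r}
library(purrr)
library(dplyr)

# Hypothetical toy tables, used only for illustration
pop_df  <- data.frame(id = c(1, 2, 3), pop = c(10, 20, 30))
gdp_df  <- data.frame(id = c(1, 2, 3), gdp = c(100, 200, 300))
life_df <- data.frame(id = c(1, 2), life = c(70, 80))

# left_join(left_join(pop_df, gdp_df, by = "id"), life_df, by = "id")
combined <- reduce(
  list(pop_df, gdp_df, life_df),
  ~ left_join(.x, .y, by = "id")
)
combined
```

Because these are left joins, rows of the first table are always kept; `id` 3 has no match in `life_df`, so its `life` value is `NA`.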



### Logical statements for lists

`every()` and `some()`

For instance, to ask whether every continent has an average life expectancy greater than 70, you can use `every()`; `some()` asks whether at least one continent does.

```{r}
gapminder_list %>% every(~{mean(.x$lifeExp) > 70})
```


```{r}
gapminder_list %>% some(~{mean(.x$lifeExp) > 70})
```


### has_element()

`has_element()` checks whether a list contains a given object; it is the list equivalent of `%in%`.

```{r}
list(1, c(2, 5, 1), "a") %>% has_element("a")
```
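
One caveat worth sketching: to my understanding `has_element()` compares whole elements with `identical()`, so the match is stricter than `%in%` on an atomic vector. This is stated as an assumption about purrr's implementation rather than a documented guarantee; the list is the same one used above.

```{r}
library(purrr)

toy_elements <- list(1, c(2, 5, 1), "a")

# Assumption: elements are compared as whole objects via identical()
has_whole_vector <- has_element(toy_elements, c(2, 5, 1))  # a complete element
has_inner_value  <- has_element(toy_elements, 2)           # only part of an element
has_integer_one  <- has_element(toy_elements, 1L)          # integer 1L vs double 1
```

Under that assumption only the first call finds a match, since neither `2` nor the integer `1L` is identical to any complete element of the list.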